160 research outputs found

    The importance of better models in stochastic optimization

    Full text link
    Standard stochastic optimization methods are brittle, sensitive to stepsize choices and other algorithmic parameters, and they exhibit instability outside of well-behaved families of objectives. To address these challenges, we investigate models for stochastic minimization and learning problems that exhibit better robustness to problem families and algorithmic parameters. With appropriately accurate models---which we call the aProx family---stochastic methods can be made stable, provably convergent and asymptotically optimal; even modeling that the objective is nonnegative is sufficient for this stability. We extend these results beyond convexity to weakly convex objectives, which include compositions of convex losses with smooth functions common in modern machine learning applications. We highlight the importance of robustness and accurate modeling with a careful experimental evaluation of convergence time and algorithm sensitivity

    Mean Estimation from Adaptive One-bit Measurements

    Full text link
    We consider the problem of estimating the mean of a normal distribution under the following constraint: the estimator can access only a single bit from each sample from this distribution. We study the squared error risk in this estimation as a function of the number of samples and one-bit measurements nn. We consider an adaptive estimation setting where the single-bit sent at step nn is a function of both the new sample and the previous nβˆ’1n-1 acquired bits. For this setting, we show that no estimator can attain asymptotic mean squared error smaller than Ο€/(2n)+O(nβˆ’2)\pi/(2n)+O(n^{-2}) times the variance. In other words, one-bit restriction increases the number of samples required for a prescribed accuracy of estimation by a factor of at least Ο€/2\pi/2 compared to the unrestricted case. In addition, we provide an explicit estimator that attains this asymptotic error, showing that, rather surprisingly, only Ο€/2\pi/2 times more samples are required in order to attain estimation performance equivalent to the unrestricted case

    Mean Estimation from One-Bit Measurements

    Full text link
    We consider the problem of estimating the mean of a symmetric log-concave distribution under the constraint that only a single bit per sample from this distribution is available to the estimator. We study the mean squared error as a function of the sample size (and hence the number of bits). We consider three settings: first, a centralized setting, where an encoder may release nn bits given a sample of size nn, and for which there is no asymptotic penalty for quantization; second, an adaptive setting in which each bit is a function of the current observation and previously recorded bits, where we show that the optimal relative efficiency compared to the sample mean is precisely the efficiency of the median; lastly, we show that in a distributed setting where each bit is only a function of a local sample, no estimator can achieve optimal efficiency uniformly over the parameter space. We additionally complement our results in the adaptive setting by showing that \emph{one} round of adaptivity is sufficient to achieve optimal mean-square error

    Distributed Delayed Stochastic Optimization

    Full text link
    We analyze the convergence of gradient-based optimization algorithms that base their updates on delayed stochastic gradient information. The main application of our results is to the development of gradient-based distributed optimization algorithms where a master node performs parameter updates while worker nodes compute stochastic gradients based on local information in parallel, which may give rise to delays due to asynchrony. We take motivation from statistical problems where the size of the data is so large that it cannot fit on one computer; with the advent of huge datasets in biology, astronomy, and the internet, such problems are now common. Our main contribution is to show that for smooth stochastic problems, the delays are asymptotically negligible and we can achieve order-optimal convergence results. In application to distributed optimization, we develop procedures that overcome communication bottlenecks and synchronization requirements. We show nn-node architectures whose optimization error in stochastic problems---in spite of asynchronous delays---scales asymptotically as \order(1 / \sqrt{nT}) after TT iterations. This rate is known to be optimal for a distributed system with nn nodes even in the absence of delays. We additionally complement our theoretical results with numerical experiments on a statistical machine learning task.Comment: 27 pages, 4 figure
    • …
    corecore